Statistical Approaches to Patent Translation for PatentMT - Experiments with Various Settings of Training Data

Authors

  • Yuen-Hsien Tseng
  • Chao-Lin Liu
  • Chia-Chi Tsai
  • Jui-Ping Wang
  • Yi-Hsuan Chuang
  • James Jeng
Abstract

This paper describes our experiments and results in the NTCIR-9 Chinese-to-English Patent Translation Task. Several open-source software packages were integrated to build a statistical machine translation model for the task. Various Chinese segmentation methods, additional resources, and training-corpus preprocessing steps were then tried on top of this model. In total, more than 20 experiments were conducted to compare translation performance. Our current results show that 1) consistent segmentation between the training and testing data is important for maintaining performance; 2) a sufficient number of good-quality bilingual training sentences is more helpful than additional bilingual dictionaries; and 3) translation effectiveness, measured in BLEU, doubles when the number of bilingual training sentences doubles at the scale of 100,000 sentences.
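The first finding, on consistent segmentation, can be illustrated with a toy sketch. This is not the authors' pipeline: the two segmenters (`segment_by_char`, `segment_max_match`), the four-word lexicon, and the sample sentences are all hypothetical, chosen only to show how a vocabulary learned under one segmentation scheme can leave test tokens unseen under another.

```python
from typing import Callable, List

# Hypothetical mini-lexicon for the dictionary-based segmenter.
LEXICON = {"半导体", "装置", "制造", "方法"}


def segment_by_char(text: str) -> List[str]:
    """Split a Chinese string into single characters."""
    return list(text)


def segment_max_match(text: str, max_len: int = 3) -> List[str]:
    """Greedy forward maximum matching against LEXICON.

    Falls back to a single character when no lexicon entry matches.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in LEXICON:
                tokens.append(piece)
                i += size
                break
    return tokens


def oov_rate(
    train: List[str],
    test: List[str],
    train_seg: Callable[[str], List[str]],
    test_seg: Callable[[str], List[str]],
) -> float:
    """Fraction of test tokens that never occur in the segmented training data."""
    vocab = {tok for sent in train for tok in train_seg(sent)}
    test_tokens = [tok for sent in test for tok in test_seg(sent)]
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


# Hypothetical source-side sentences (not taken from the NTCIR-9 data).
TRAIN = ["半导体装置", "制造方法"]
TEST = ["半导体制造装置"]

consistent = oov_rate(TRAIN, TEST, segment_max_match, segment_max_match)
mismatched = oov_rate(TRAIN, TEST, segment_max_match, segment_by_char)
print(f"consistent segmentation OOV rate: {consistent:.2f}")
print(f"mismatched segmentation OOV rate: {mismatched:.2f}")
```

In this toy setting, every test token is covered when both sides use the same segmenter, while none is covered when the test side is split by character. A real system would see a smaller gap, but the mechanism is the same: tokens the translation model never saw in training cannot be translated from its phrase table.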

Similar resources

Machine Translation System for Patent Documents Combining Rule-based Translation and Statistical Postediting Applied to the NTCIR-10 PatentMT Task

In this article, we describe the system architecture, the preparation of training data, and the experimental results of the EIWA group in the NTCIR-10 Patent Translation Task. Our system combines rule-based machine translation and statistical post-editing. What is new in our system compared with the NTCIR-9 PatentMT task is an automatic method for selecting from multiple translations...

The NiuTrans Machine Translation System for NTCIR-9 PatentMT

This paper describes the NiuTrans system developed by the Natural Language Processing Lab at Northeastern University for the NTCIR-9 Patent Machine Translation task (NTCIR-9 PatentMT). We present our submissions to the two tracks of NTCIR-9 PatentMT, and show several improvements to our phrase-based Statistical MT engine, including: a hybrid reordering model, large-scale language modeling, and ...

UQAM's System Description for the NTCIR-10 Japanese and English PatentMT Evaluation Tasks

This paper describes the development of a Japanese-English and English-Japanese translation system for the NTCIR-10 PatentMT tasks. The MT system is based on the provided training data and the Moses decoder. We report our first attempt at statistical machine translation for these language pairs and the patent domain.

BBN's Systems for the Chinese-English Sub-task of the NTCIR-9 PatentMT Evaluation

This paper describes the work we conducted for building a statistical machine translation (SMT) system for the Chinese-English sub-task of the NTCIR-9 patent machine translation (MT) evaluation [17]. We first applied the various techniques on patent data that we had developed for improving SMT performance on other types of data. Our results show that most of the techniques work on patent documen...

System Description of BJTU-NLP SMT for NTCIR-9 PatentMT

This paper presents an overview of the statistical machine translation systems that BJTU-NLP developed for the NTCIR-9 Patent Machine Translation Task (NTCIR-9 PatentMT). We compared the performance of the phrase-based translation model and the factored translation model in our patent SMT systems for Chinese-to-English and English-to-Japanese. Factored translation model was proposed as an extended phrase-base...

Publication date: 2011